Energy Load Forecast in the Czech Republic: An Artificial Neural Networks Approach
Author
Camilo Gómez and Silvester van Koten
Published
July 11, 2023
Code
# Let's import packages
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import tensorflow as tf
from prettytable import PrettyTable
import plotly.express as px
import plotly
import plotly.io as pio
pio.templates.default = "seaborn"
pio.renderers.default = "plotly_mimetype+notebook_connected"
import os
import keras
from keras import models
from keras import layers
from keras import regularizers
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_absolute_percentage_error
sns.set()  # make plots nicer
1 Introduction
Accurate electricity load forecasting has become a crucial factor in the efficient operation of power systems worldwide. It helps utilities, grid operators, policymakers, and market participants make informed decisions regarding generation planning, resource allocation, pricing, and demand-side management. Traditionally, electricity load forecasts are built using stochastic time series models. For instance, the European Network of Transmission System Operators for Electricity (ENTSO-E) uses the Mid-term Adequacy Forecast (MAF) methodology, which assumes a polynomial relation between temperature and energy load (citation). However, recent work has applied artificial intelligence methods to predict electricity load and prices, reaching a lower mean absolute percentage error than linear methods (Behm, Nolting, and Praktiknjo 2020). Behm, Nolting, and Praktiknjo (2020) propose an artificial neural network (hereafter, ANN) that creates synthetic electricity load profiles for Germany, Sweden, Spain, and France in a 2025 scenario using meteorological and calendric data. In this paper, we adapt Behm, Nolting, and Praktiknjo (2020)’s method to the Czech market to answer three questions:
What would have been the energy load profile in the Czech Republic in a scenario in the absence of COVID-19 in 2020 and 2021?
What would have been the energy load profile in the Czech Republic in a scenario in the absence of the Russo-Ukrainian War in 2022 and 2023?
What will be the forecast for the energy profile in 2025 in the Czech Republic?
The past three years have presented significant challenges for the energy industry in Europe. First, the COVID-19 pandemic was perhaps the most significant global economic shock in a century, notably affecting production, distribution, and consumption across the economy. Given that the energy industry runs through this entire chain, the outcome in the absence of the pandemic is worth considering. Second, the Russo-Ukrainian War represents a regional shock directly affecting the European energy market, given Russia’s role in energy production. Studying how the war has affected the energy load in the Czech Republic is relevant because it provides insights into how public policy might react in this and similar scenarios. Finally, it is interesting to look at how these two major events have affected the medium-term prediction of the energy profile and to compare it with the scenario without them.
2 Artificial Neural Network Approach
We follow the method developed by Behm, Nolting, and Praktiknjo (2020)…
3 Data
Code
# data import (daily)
data_complete = pd.read_csv('final_db.csv')
# data import (hourly)
data_complete2 = pd.read_csv('final_db2.csv')
data_complete2 = data_complete2.drop('Unnamed: 0', axis=1)
# Only consider dates before 2023 and after 2014
data_complete2 = data_complete2[data_complete2['Date'] < '2023-01-01']
load_mean = data_complete2['load'].mean()
load_sd = data_complete2['load'].std()
#print(data_complete2)
We collected hourly electricity load data from the ENTSO-E Transparency Platform (2022) and meteorological data from the Climate Change Service (C3S) (2023) for the Czech Republic for 2015-2022. Figure 1 shows the energy load series in megawatts (MW) for this period. A noteworthy behavior in this series is seasonality, with valleys in summer and peaks in winter (with exceptions on Christmas Eve and New Year’s Eve). The mean and standard deviation over the period are 7458 MW and 1283 MW, respectively.
Code
load_plot = px.line(data_complete2, x='Date', y='load')
load_plot.update_xaxes(rangeslider_visible=True)
load_plot.update_yaxes(title='Energy load in megawatts (MW)')
load_plot.show()
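The seasonality described above can also be checked numerically by averaging the load by calendar month. The sketch below applies this pandas pattern to a synthetic winter-peaking hourly series, since the underlying `final_db2.csv` data is not reproduced here; the sinusoid is an illustrative assumption, not our data.

```python
import numpy as np
import pandas as pd

# Synthetic hourly load: a winter-peaking sinusoid around 7458 MW (illustration only;
# the real series comes from the ENTSO-E Transparency Platform)
rng = pd.date_range('2015-01-01', '2016-12-31 23:00', freq='h')
day_of_year = rng.dayofyear.to_numpy()
load = 7458 + 1283 * np.cos(2 * np.pi * day_of_year / 365.25)  # peak near January 1
df = pd.DataFrame({'Date': rng, 'load': load})

# Average load by calendar month reveals the winter-peak / summer-valley seasonality
monthly = df.groupby(df['Date'].dt.month)['load'].mean()
print(monthly.round(0))
```

On the real data, the same `groupby` on `data_complete2` would give the monthly profile behind Figure 1.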
The meteorological data includes the eastward and northward wind speed at 10 meters above the surface, dew point temperature at 2 meters above the surface, temperature at 2 meters above the surface, surface pressure, and total precipitation. Table 1 shows the descriptive statistics for these variables.
Code
climate_statistics = data_complete2[['u10', 'v10', 'd2m', 't2m', 'sp', 'tp']].agg(['mean', 'std', 'min', 'max'])
# Create a new DataFrame for the statistics
statistics_df = pd.DataFrame({'Variable': climate_statistics.columns})
# Add average, standard deviation, minimum, and maximum columns
statistics_df['Average'] = climate_statistics.loc['mean'].values
statistics_df['Standard Deviation'] = climate_statistics.loc['std'].values
statistics_df['Minimum'] = climate_statistics.loc['min'].values
statistics_df['Maximum'] = climate_statistics.loc['max'].values
# Display the table
#print(statistics_df.round(2))
Table 1: Descriptive statistics for meteorological variables in the Czech Republic, 2015-2022
Variable | Unit | Average | Standard Deviation | Minimum | Maximum
Eastward wind speed at 10 meters above the surface | Meters per second (m/s) | 0.90 | 2.54 | -7.65 | 12.54
Northward wind speed at 10 meters above the surface | Meters per second (m/s) | 0.17 | 1.99 | -8.00 | 8.22
Dew point temperature at 2 meters above the surface | Degrees Celsius (°C) | 5.04 | 6.51 | -21.63 | 20.14
Temperature at 2 meters above the surface | Degrees Celsius (°C) | 10.45 | 8.47 | -20.66 | 37.16
Surface pressure | Pascals (Pa) | 98320.22 | 830.02 | 94125.44 | 101187.94
Total precipitation | Meters (m) | 0.00 | 0.00 | -0.00 | 0.01
4 Results
In this section we present the results of our ANN model. We use the data from 2015 to 2018 as the training set, holding out 20% as a test set, and use 2019 as a validation set to check for overfitting. In Section 6 we discuss all the model specifications we considered and compare their prediction metrics. We chose the specification with the lowest mean absolute percentage error, 2.54%. The model has 4 layers, 120 epochs, and a batch size of 32. We use the RMSprop optimizer (Hinton 2012) with the default learning rate of 0.001.
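The error metrics used to select the model can be computed with scikit-learn as follows. The actual and predicted vectors below are made-up illustrations, not our model's output; on the real data they would be `data2_test_y` and the model's predictions.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error)

# Hypothetical actual and predicted hourly loads in MW (illustration only)
actual = np.array([7000.0, 7500.0, 8000.0, 6500.0])
predicted = np.array([7100.0, 7400.0, 8200.0, 6400.0])

mae = mean_absolute_error(actual, predicted)           # average absolute error in MW
rmse = np.sqrt(mean_squared_error(actual, predicted))  # penalizes large misses more
mape = mean_absolute_percentage_error(actual, predicted)  # scale-free criterion we minimize

print(f"MAE={mae:.1f} MW, RMSE={rmse:.1f} MW, MAPE={mape:.2%}")
```

Because MAPE is scale-free, it lets us compare specifications without worrying about the MW scale of the load series.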
Code
# select rows with date up to 2018-12-31
database2 = data_complete2[data_complete2['Date'] <= '2018-12-31']
# Drop missing values
data2 = database2.dropna().copy()
# select rows with date later than 2018-12-31
data2_2019_2022 = data_complete2[data_complete2['Date'] > '2018-12-31']
# Drop missing values
data2_2019_2022 = data2_2019_2022.dropna().copy()
# train test split
# data2_X is a dataframe with all columns of 'data2' except 'Date', 'load', 'price', and 'load_forecast'.
# data2_y is a series with only the 'load' column of 'data2'.
data2_X, data2_y = data2.drop(["Date", "load", "price", "load_forecast"], axis=1), data2.load
# drop missing values
data2_X.dropna(inplace=True)
data2_y.dropna(inplace=True)
# 'train_test_split()' splits the predictors and the target into training and testing sets
# 20% of the data will be used for testing; 42 is the random seed
data2_train_X, data2_test_X, data2_train_y, data2_test_y = train_test_split(
    data2_X, data2_y, test_size=0.2, random_state=42)
# normalize the data
scaler2 = MinMaxScaler()    # define scaling method
scaler2.fit(data2_train_X)  # fit the scaler on the training data
data2_train_X = scaler2.transform(data2_train_X)
data2_test_X = scaler2.transform(data2_test_X)
Code
# create a NN
# We build the model with the Keras Sequential API. It has four layers: the first is densely
# connected with 64 units, a ReLU activation function, and an input shape matching the
# training data; layers 2-3 are also densely connected with 64 units and ReLU activations;
# the final layer is a single unit with no activation function, so the output is a
# continuous value.
# The compile method specifies the optimizer, loss function, and evaluation metrics.
# We use mean squared error as the loss.
def build_model2():
    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu', input_shape=(data2_train_X.shape[1],)))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1))
    model.compile(optimizer='rmsprop', loss='mse',
                  metrics=['mean_absolute_error', 'mean_squared_error',
                           'mean_absolute_percentage_error'])
    return model
Code
model_hourly6 = build_model2()
4.1 The impact of COVID-19 on energy load profile in the Czech Republic
In this section we ask: what would the energy load profile in the Czech Republic have been in the absence of the COVID-19 outbreak in 2020 and its lingering effects in 2021?
For this purpose, we feed the actual 2020 input data into the ANN model described in the previous section. Because the model was trained on historical data from before COVID-19, this exercise yields a counterfactual series of how the energy load would have evolved without the pandemic.
The following figure shows both the predicted and actual series of the energy load in megawatts (MW) for the Czech Republic in 2020. The first result we want to highlight is that both series behave similarly until the beginning of the state of emergency in the Czech Republic on March 12, 2020. However, after mid-March, the predicted series shows higher values than the actual ones. Naturally, this behavior is due to the lockdown measures that affected all economic activity. According to the Czech National Bank, these measures had a negative impact on all industries, with the largest effect on services (especially tourism and restaurants), transport, and automobile production (Róbert et al. 2020). The second result is that, following the lifting of most restrictions in June, the two series began to gradually converge. Overall, we note that the measures to control the spread of COVID-19 in the Czech Republic significantly reduced the energy load in the March-June period of 2020 but had little effect in subsequent months.
Code
# Reshape pred2_2020 to match the shape of test2_date_2020
pred2_2020 = np.reshape(pred2_2020, (-1,))
# Combine actual and predicted values into a single DataFrame
df = pd.DataFrame({'Date': test2_date_2020,
                   'Actual': test2_2020_full['load'],
                   'Predicted': pred2_2020})
# Plot the actual and predicted values
pred_plot_2020 = px.line(df, x='Date', y=['Actual', 'Predicted'], labels={'value': 'Load, MW'})
# Set axis labels
pred_plot_2020.update_layout(xaxis_title='Date', yaxis_title='Energy load in megawatts (MW)')
pred_plot_2020.update_xaxes(rangeslider_visible=True)
# Show the plot
pred_plot_2020.show()
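The lockdown effect described above can be quantified by averaging the predicted-minus-actual gap over the emergency window. The sketch below does this on synthetic data; the flat 7500 MW series, the 600 MW drop, and all variable names are illustrative assumptions, not our model's output.

```python
import pandas as pd

# Hypothetical daily series for 2020 (illustration only; the real gap would come from
# the ANN counterfactual vs. the observed ENTSO-E load)
dates = pd.date_range('2020-01-01', '2020-12-31', freq='D')
predicted = pd.Series(7500.0, index=dates)
actual = predicted.copy()
# mimic a lockdown-driven demand drop between the state of emergency and July
lockdown = (dates >= '2020-03-12') & (dates < '2020-07-01')
actual[lockdown] -= 600.0

gap = predicted - actual  # positive gap = counterfactual load above observed load
lockdown_gap = gap[lockdown].mean()
rest_gap = gap[~lockdown].mean()
print(f"mean gap during lockdown: {lockdown_gap:.0f} MW; rest of year: {rest_gap:.0f} MW")
```

The same windowed average applied to the real 2020 series would put a single number on the demand destruction visible in the figure.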
For 2021, we note that, on average, actual values are slightly lower than predicted. The average energy load is 7554 MW (sd. 1251.8) for the actual values and 7750 MW (sd. 1270.3) for the predicted values. One possible explanation is that the economy adjusted to the new conditions and now uses less energy than pre-pandemic patterns would suggest.
Code
# Reshape pred2_2021 to match the shape of test2_date_2021
pred2_2021 = np.reshape(pred2_2021, (-1,))
# Combine actual and predicted values into a single DataFrame
df = pd.DataFrame({'Date': test2_date_2021,
                   'Actual': test2_2021_full['load'],
                   'Predicted': pred2_2021})
# Plot the actual and predicted values
pred_plot_2021 = px.line(df, x='Date', y=['Actual', 'Predicted'], labels={'value': 'Load, MW'})
# Set axis labels
pred_plot_2021.update_layout(xaxis_title='Date', yaxis_title='Energy load in megawatts (MW)')
pred_plot_2021.update_xaxes(rangeslider_visible=True)
# Show the plot
pred_plot_2021.show()
4.2 The impact of the Russo-Ukrainian War on energy load profile in the Czech Republic
Code
# Reshape pred2_2022 to match the shape of test2_date_2022
pred2_2022 = np.reshape(pred2_2022, (-1,))
# Combine actual and predicted values into a single DataFrame
df = pd.DataFrame({'Date': test2_date_2022,
                   'Actual': test2_2022_full['load'],
                   'Predicted': pred2_2022})
# Plot the actual and predicted values
pred_plot_2022 = px.line(df, x='Date', y=['Actual', 'Predicted'], labels={'value': 'Load, MW'})
# Set axis labels
pred_plot_2022.update_layout(xaxis_title='Date', yaxis_title='Energy load in megawatts (MW)')
pred_plot_2022.update_xaxes(rangeslider_visible=True)
# Show the plot
pred_plot_2022.show()
5 Discussion
References
Behm, Christian, Lars Nolting, and Aaron Praktiknjo. 2020. “How to model European electricity load profiles using artificial neural networks.” Applied Energy 277: 115564.
Code
# Try several epochs and batch_size specifications
# A rule of thumb is to start with roughly 3 times the number of columns in the data
model_hourly = build_model2()
model_hourly.fit(data2_train_X, data2_train_y, epochs=27, batch_size=16, verbose=0)
model_hourly2 = build_model2()
model_hourly2.fit(data2_train_X, data2_train_y, epochs=45, batch_size=16, verbose=0)
model_hourly3 = build_model2()
model_hourly3.fit(data2_train_X, data2_train_y, epochs=90, batch_size=16, verbose=0)
model_hourly4 = build_model2()
model_hourly4.fit(data2_train_X, data2_train_y, epochs=100, batch_size=16, verbose=0)
model_hourly5 = build_model2()
model_hourly5.fit(data2_train_X, data2_train_y, epochs=120, batch_size=16, verbose=0)
Code
## Around 100-120 epochs seems optimal; now check whether batch_size matters
# 32 is a rule of thumb and a good initial choice; try powers of 2
model_hourly6 = build_model2()
model_hourly6.fit(data2_train_X, data2_train_y, epochs=120, batch_size=32, verbose=0)
model_hourly7 = build_model2()
model_hourly7.fit(data2_train_X, data2_train_y, epochs=120, batch_size=64, verbose=0)
model_hourly8 = build_model2()
model_hourly8.fit(data2_train_X, data2_train_y, epochs=120, batch_size=128, verbose=0)
model_hourly9 = build_model2()
model_hourly9.fit(data2_train_X, data2_train_y, epochs=120, batch_size=512, verbose=0)
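The grid search above can be summarized by collecting each candidate's test-set MAPE and keeping the minimum. The sketch below hard-codes hypothetical scores: only the winning configuration's 2.54% comes from our results; the others are placeholders for the values `model.evaluate(data2_test_X, data2_test_y)` would return.

```python
# Model selection sketch: map each (epochs, batch_size) candidate to its test-set MAPE
# in percent. All scores except (120, 32) are hypothetical placeholders.
candidates = {
    (120, 32): 2.54,
    (120, 64): 2.71,
    (120, 128): 2.93,
    (120, 512): 3.40,
}
# Keep the configuration with the lowest MAPE
best_config = min(candidates, key=candidates.get)
print(f"best: epochs={best_config[0]}, batch_size={best_config[1]}, "
      f"MAPE={candidates[best_config]:.2f}%")
```

Encoding the search this way makes the selection criterion explicit and easy to extend with further candidates.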
Figure 2 shows our model's forecast and the actual values of the energy load in megawatts (MW) for the Czech Republic in 2019. In general, we observe an accurate fit on the validation data.
Code
# Reshape pred2_2019 to match the shape of test2_date_2019
pred2_2019 = np.reshape(pred2_2019, (-1,))
# Combine actual and predicted values into a single DataFrame
df = pd.DataFrame({'Date': test2_date_2019,
                   'Actual': test2_2019_full['load'],
                   'Predicted': pred2_2019})
# Plot the actual and predicted values
pred_plot_2019 = px.line(df, x='Date', y=['Actual', 'Predicted'], labels={'value': 'Load, MW'})
# Set axis labels
pred_plot_2019.update_layout(xaxis_title='Date', yaxis_title='Energy load in megawatts (MW)')
pred_plot_2019.update_xaxes(rangeslider_visible=True)
# Show the plot
pred_plot_2019.show()
Code
## The goal is to predict 2019-2024. Take the 2015-2018 subset to train the model.
# select rows with date up to 2018-12-31
data = data_complete[data_complete['Date'] <= '2018-12-31']
# select rows with date later than 2018-12-31
data_2019_2022 = data_complete[data_complete['Date'] > '2018-12-31']
Code
#Add holidays data to the model
Code
# Database 1 (Daily)
# train test split
# data_X is a dataframe with all columns of 'data' dropping 'Date' and 'load'.
# data_y is a series with only the 'load' column of 'data'.
data_X, data_y = data.drop(["Date", "load"], axis=1), data.load
# 'train_test_split()' splits the predictors and the target into training and testing sets
# 20% of the data will be used for testing; 42 is the random seed
data_train_X, data_test_X, data_train_y, data_test_y = train_test_split(
    data_X, data_y, test_size=0.2, random_state=42)
Code
# fit trains the model on the training data by minimizing the loss function with the
# specified optimizer. Parameters:
# epochs: number of times to iterate over the entire training dataset.
# batch_size: number of samples per gradient update.
# verbose=0: no output is printed during training.
#model_daily = build_model()
#model_daily.fit(data_train_X, data_train_y,
#                epochs=46, batch_size=16, verbose=0)
Code
# evaluate measures the performance of the trained NN model on the test data. It takes the
# test predictors (data_test_X) and the test target (data_test_y) and returns the values of
# the specified metrics: test_mse_score is the mean squared error (MSE) between the model's
# predictions and the actual target values; mae_score is the mean absolute error (MAE).
#scores = model_daily.evaluate(data_test_X, data_test_y)
#print(scores)
Code
# Check indicators to compare models: absolute error, mean squared error.
# Check other specifications of the ANN; try several numbers of epochs.
# 1. What happens to prediction quality if we change data from hourly to daily?
# 2. Synthetic year for COVID and the Russo-Ukrainian War. Counterfactual.
# 3. What happens now in 2023?
# 4. Compare CEPS.cz data; we should talk to them - Silvester.
"""# Extract the first date and hour and convert it to datetime data typeentso_data_2022['Date'] = pd.to_datetime(entso_data_2022['Date'].str.split(' - ').str[0], format='%d.%m.%Y %H:%M')print(entso_data_2022)"""
"\n# Extract the first date and hour and convert it to datetime data type\nentso_data_2022['Date'] = pd.to_datetime(entso_data_2022['Date'].str.split(' - ').str[0], format='%d.%m.%Y %H:%M')\n\nprint(entso_data_2022)\n"
Code
"""# group by the date (without the time component) and calculate the mean valueentso_daily_data_2022 = entso_data_2022.groupby(entso_data_2022['Date'].dt.date)['Forecast'].mean().reset_index()# convert the date column to datetime type#entso_daily_data['Date'] = pd.to_datetime(entso_daily_data['Date'])# set the date column as the index for each dataframe#entso_daily_data.set_index('Date', inplace=True)print(entso_daily_data_2022)"""
"\n# group by the date (without the time component) and calculate the mean value\nentso_daily_data_2022 = entso_data_2022.groupby(entso_data_2022['Date'].dt.date)['Forecast'].mean().reset_index()\n\n# convert the date column to datetime type\n#entso_daily_data['Date'] = pd.to_datetime(entso_daily_data['Date'])\n\n# set the date column as the index for each dataframe\n#entso_daily_data.set_index('Date', inplace=True)\n\nprint(entso_daily_data_2022)\n"
"""# calculate the difference between actual and our predicted valuesdiff_1 = combined_2022['load'] - combined_2022['pred']diff_2 = combined_2022['load'] - combined_2022['Forecast']# plot predicted valuesplt.plot(test_date, diff_2 , label='entso Predicted')# plot the differenceplt.plot(test_date, diff_1, label='Our Prediction')# set plot title and axis labelsplt.title('Difference between Actual and Predicted Load 2022')plt.xlabel('Year - Month')plt.ylabel('Load Difference, MW')# show legendplt.legend()#save the plot#plt.savefig('prediction_2022.png')# display the plotplt.show()"""
"\n# calculate the difference between actual and our predicted values\ndiff_1 = combined_2022['load'] - combined_2022['pred']\ndiff_2 = combined_2022['load'] - combined_2022['Forecast']\n\n# plot predicted values\nplt.plot(test_date, diff_2 , label='entso Predicted')\n\n# plot the difference\nplt.plot(test_date, diff_1, label='Our Prediction')\n\n# set plot title and axis labels\nplt.title('Difference between Actual and Predicted Load 2022')\nplt.xlabel('Year - Month')\nplt.ylabel('Load Difference, MW')\n\n# show legend\nplt.legend()\n\n#save the plot\n#plt.savefig('prediction_2022.png')\n\n# display the plot\nplt.show()\n"